Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus
Abstract
Noun-Noun compounds, a subset of compound nouns and nominal compounds, play an important role in NLP applications such as Machine Translation and Information Retrieval because of their token frequency, type frequency, and occurrence across the world's languages. Recognizing MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is particularly difficult in Bengali because such tools and corpora are lacking. This paper investigates Noun-Noun bigram collocations in a medium-size untagged Bengali corpus of articles by Rabindranath Tagore. It uses a simple unsupervised approach that combines several statistical measures of the affinity between the constituents of each bigram candidate as evidence of Multi-Word Expression (MWE) status, and it builds a weighted measure to distinguish MWEs from non-MWEs. We describe different taxonomies of compound-noun MWEs in Bengali based on morphosyntactic flexibility. We also identify major Noun-Noun semantic collocations that are not MWEs. This initial approach for Bengali is promising in terms of precision, recall, and F-score.
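The abstract does not name the statistical measures or the weights used, so the following is only an illustrative sketch of the general idea: score each adjacent word pair with association measures and combine them into one weighted value. Pointwise mutual information (PMI) and the Dice coefficient are assumed here as stand-ins, and the function name `score_bigrams`, the `min_count` cutoff, and the weights are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def score_bigrams(tokens, min_count=2, w_pmi=0.5, w_dice=0.5):
    """Score adjacent word pairs with PMI, Dice, and a weighted sum.

    Assumptions (not from the paper): PMI and Dice as the association
    measures, and a plain weighted sum as the combined score.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable association estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        dice = 2 * count / (unigrams[w1] + unigrams[w2])
        combined = w_pmi * pmi + w_dice * dice
        scores[(w1, w2)] = {"pmi": pmi, "dice": dice, "combined": combined}
    return scores


# Toy token sequence for illustration only (not Bengali corpus data).
toy_tokens = ["a", "b", "a", "b", "c", "d"]
toy_scores = score_bigrams(toy_tokens)
```

In an actual pipeline, candidates would first be restricted to noun-noun pairs, and a threshold on the combined score would separate likely MWEs from non-MWEs; both steps are omitted here because the abstract gives no details on them.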
Similar resources
Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest
This paper presents a supervised machine learning approach that uses the Random Forest algorithm to recognize Bengali noun-noun compounds as multiword expressions (MWEs) in a Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using chunk information and various heuristic rules and (2) training the ...
Relative Compositionality of Multi-word Expressions: A Study of Verb-Noun (V-N) Collocations
Recognition of Multi-word Expressions (MWEs) and their relative compositionality are crucial to Natural Language Processing. Various statistical techniques have been proposed to recognize MWEs. In this paper, we integrate all the existing statistical features and investigate a range of classifiers for their suitability for recognizing the non-compositional Verb-Noun (V-N) collocations. In the t...
Lexical and Grammatical Collocations in Writing Production of EFL Learners
Lewis (1993) recognized the significance of word combinations, including collocations, by presenting the lexical approach. Because of the crucial role of collocation in vocabulary acquisition, this research set out to evaluate the rate of collocations in Iranian EFL learners' writing production across L1 and L2. In addition, L1 interference with L2 collocational use in the learners' writing samples was st...
Japanese Learners’ Dictionary of I-adjective-noun Collocations
This paper demonstrates a method for creating a Japanese learners' dictionary of i-adjective-noun collocations. After an introduction to the importance of collocations and the necessity of their inclusion in Japanese language learning, we present various corpora types and corpus query tools that are used to obtain a variety of collocational usage in different types of discourse. The Japanese languag...
Collocational Clashes in the Persian Translations of Tuesdays with Morrie
This study aimed at finding features of collocational deviations in the translations of Tuesdays with Morrie. To this end, categories of collocations and collocational clashes, as well as causes of collocational clashes, were explored. The present work investigated five Persian translations of the novel. All the books were examined completely and all possible collocational clashes were...
Publication date: 2010